Journal of General Internal Medicine
Springer Science and Business Media LLC
Preprints posted in the last 7 days, ranked by how well they match Journal of General Internal Medicine's content profile, based on 20 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.
Park, A.; Yin, L.; Wong, A.; Lee, C.; Choi, Y.
Medical discrimination may alter how patients relate to health information sources following adverse care encounters. We examined whether discrimination experience is associated with selective erosion of institutional health trust and with compensatory digital health engagement, using nationally representative data from the Health Information National Trends Survey (HINTS) 6 (2022; n=6,252) and HINTS 7 (2024; n=7,278). Survey-weighted modified Poisson regression estimated prevalence ratios (PRs) for binary high-trust outcomes, and survey-weighted ordinary least squares estimated coefficients for continuous outcomes; jackknife replicate weights (50 replicates) provided variance estimates. Discrimination was associated with substantially lower probability of high trust in the healthcare system (PR=0.39; 95% CI 0.30-0.52) and physicians (PR=0.85; 95% CI 0.77-0.94), with no significant association for trust in scientists, government, family, or religious organisations. The clinical-institutional pattern replicated in HINTS 6, which additionally showed reduced trust in scientists for race/ethnicity-based discrimination. Contrary to a disengagement hypothesis, discrimination-exposed adults showed higher probability of online health information seeking (PR=1.06), health app use (PR=1.11), and online provider messaging (PR=1.13); these associations persisted after adjustment for trust in physicians. Discrimination was independently associated with lower health self-efficacy (b=-0.271). Medical discrimination selectively erodes trust in clinical institutions while leaving broader epistemic trust largely intact. Despite this, discrimination-exposed patients engage more actively with digital health channels, consistent with compensatory reorientation toward non-clinical information sources. These findings describe engaged but institutionally alienated patients, with implications for restoring clinical trust and for equity-centred digital health design.
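The variance step mentioned in this abstract (jackknife replicate weights, 50 replicates) can be sketched as follows. This is a minimal illustration, not the authors' code: the (R-1)/R multiplier is an assumed delete-a-group jackknife convention, and the replicate estimates below are hypothetical numbers chosen only to exercise the functions.

```python
import math

def jackknife_se(full_estimate, replicate_estimates):
    """Standard error from jackknife replicate estimates.

    Assumes a delete-a-group jackknife with multiplier (R - 1) / R,
    which is 49/50 for 50 replicates. This multiplier is an assumption.
    """
    r = len(replicate_estimates)
    factor = (r - 1) / r
    variance = factor * sum((e - full_estimate) ** 2 for e in replicate_estimates)
    return math.sqrt(variance)

def pr_confidence_interval(log_pr, log_pr_replicates, z=1.96):
    """Confidence interval for a prevalence ratio, built on the log scale."""
    se = jackknife_se(log_pr, log_pr_replicates)
    return math.exp(log_pr - z * se), math.exp(log_pr + z * se)

# Hypothetical inputs: a full-sample log PR and 50 replicate log PRs.
log_pr = math.log(0.85)
replicates = [log_pr + 0.007 * ((-1) ** i) for i in range(50)]
low, high = pr_confidence_interval(log_pr, replicates)
print(f"PR = 0.85, 95% CI {low:.2f} to {high:.2f}")
```

In practice each replicate estimate would come from refitting the survey-weighted model with one set of replicate weights; here the replicates are simulated to keep the sketch self-contained.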
Squire, K.
Background. The emergency department in the United States of America functions as a residual access point for healthcare and social services for populations including rural communities, the uninsured, mental health and addiction patients, and the unhoused. The workforce variable that determines unit function (experience density, the concentration of accumulated clinical judgment within a unit workforce) is not measured in hospital accounting systems. Objective. To document workforce composition changes in U.S. emergency nursing across the 2018 and 2022 cycles of the National Sample Survey of Registered Nurses (NSSRN), and to specify falsifiable predictions for the 2026 cycle. Methods. We analyzed NSSRN public-use files using a four-way ED definition extending Castner et al. (2024) and a hospital-bedside-restricted comparator. Variance estimation used jackknife replicate weights for 2018 and Successive Differences Replication for 2022. Burnout was operationalized using the Norful et al. (2023) leaving-reasons proxy across cycles, with sensitivity analysis using the 2022 direct burnout item. Results. A 15-year trajectory (2008-2022) documents progressive experience-density compression: the ED's 15+ year veteran cohort fell from 41.9% to 28.0% over the decade preceding the pandemic, a loss of nearly a third of the senior cohort and a 19.6% decline in mean experience density, before recovering modestly to 33.3% as veteran nurses remained through the pandemic acute phase, leaving the ED as the youngest hospital setting throughout. Hospital non-ED bedside nurses lost senior tenure between cycles (mean 15.65 → 14.06 years since first licensure; 15+ year share 43.5% → 38.7%), while ED nurses retained their senior tail (mean 11.60 → 12.58). Burnout endorsement rose sharply in both populations (non-ED 27.3% → 46.0%; ED 34.2% → 61.2%), with the ED-vs-non-ED gap more than doubling.
Controlling for tenure, ED status was not independently associated with burnout in 2018 (OR 1.15, 95% CI 0.83-1.59) but was strongly associated in 2022 (OR 1.92, 95% CI 1.44-2.55; p<.001). The direct burnout item showed a parallel pattern (OR 2.92, 95% CI 1.62-5.28). Conclusions. A pandemic-era setting-specific burnout effect emerged in emergency nursing that workforce-composition controls cannot explain. The 2022 cycle establishes a pre-exit baseline against which the 2026 NSSRN will serve as the falsifiable test of post-Omicron veteran exit. Nursing pipeline replacement lag exceeds the interval before 2026 data arrives; the consequences of inaction fall on populations dependent on ED-based residual access.
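Two of the headline figures in this abstract can be rechecked with simple arithmetic: the "nearly a third" loss of the veteran cohort and the "more than doubling" of the ED-vs-non-ED burnout gap. The sketch below only uses percentages quoted in the abstract; the function and variable names are illustrative.

```python
def relative_decline(before, after):
    """Fractional decline from `before` to `after`."""
    return (before - after) / before

# Veteran (15+ year) share of the ED workforce, as quoted in the abstract.
veteran_loss = relative_decline(41.9, 28.0)

# Burnout endorsement gap (ED minus non-ED), in percentage points.
gap_2018 = 34.2 - 27.3
gap_2022 = 61.2 - 46.0
gap_ratio = gap_2022 / gap_2018

print(f"Relative loss of veteran cohort: {veteran_loss:.1%}")
print(f"Burnout gap: {gap_2018:.1f} to {gap_2022:.1f} points (ratio {gap_ratio:.2f})")
```

Both results are consistent with the wording of the abstract: a relative loss of about one third and a gap ratio a little above two.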
Landry, T. C.; Kim, Y.
Background. Capillary refill time is a resuscitation target in septic shock [1-4], but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 [5] was linked to the MIMIC-IV clinical database [6]. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy.
The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen's κ 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.
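The abstract reports confidence intervals "estimated by resampling patients at a fixed seed". One common way to do this is a patient-level (cluster) percentile bootstrap, sketched below. The abstract does not give the authors' exact procedure, so the percentile method, the number of resamples, and the toy data are all assumptions made for illustration.

```python
import random

def patient_bootstrap_ci(cycles_by_patient, n_boot=2000, seed=0, alpha=0.05):
    """Percentile CI for the detector-positive proportion, resampling patients.

    cycles_by_patient maps a patient id to a non-empty list of 0/1 flags,
    one per cuff cycle (1 = detector-positive). Whole patients are resampled
    with replacement so that cycles from one patient stay together.
    """
    rng = random.Random(seed)  # fixed seed makes the interval reproducible
    patients = list(cycles_by_patient)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(patients) for _ in patients]
        flags = [f for p in sample for f in cycles_by_patient[p]]
        stats.append(sum(flags) / len(flags))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical toy data, not taken from MIMIC-IV-WDB.
toy = {
    "p1": [1, 0, 0, 0],
    "p2": [0, 0, 0, 0, 0],
    "p3": [0, 1, 0],
    "p4": [0, 0],
}
ci = patient_bootstrap_ci(toy)
print(ci)
```

Resampling patients rather than individual cycles reflects that positive cycles cluster within a few patients (268 cycles in 15 patients in the abstract), which a cycle-level bootstrap would ignore.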
Gharibyan, I.; Ahner, E.; Shao, R.; Sharma, D.; Navarsartian Tazehkand, T.; Diep, J.; Assoumou, B.
Background: Statins are key to preventing atherosclerotic cardiovascular disease and lowering low-density lipoprotein cholesterol and cardiovascular events. However, skepticism regarding their safety and value persists and is increasingly influenced by social media. TikTok has emerged as a major source of health information, but its content varies in quality and accuracy. This study evaluated the quality, attitudes, misinformation, and engagement of statin-related content on TikTok. Methods: Public TikTok videos were collected using predefined search terms and coded by creator type, thematic content, and overall attitude. Video quality was assessed using the DISCERN instrument, the Patient Education Materials Assessment Tool for Audiovisual Materials, and the Global Quality Score. False or misleading claims were independently reviewed by two cardiology fellows. Associations between engagement and quality were also examined. Results: Of 1,349 screened videos, 258 met inclusion criteria. Most were educational (91.0%), with non-physician healthcare providers (34.5%) as the largest creator group. Risks or negative effects were discussed more often than benefits (63.2% vs 42.2%), and 39.5% contained at least one false or misleading claim, most often from complementary and alternative medicine providers and wellness promoters. Quality differed by creator type across all instruments, with physician-created content scoring highest. Video popularity showed minimal association with informational quality. Conclusion: Statin-related TikTok content frequently emphasizes harms, often contains misinformation, and varies substantially in quality by creator type. Greater involvement of healthcare professionals on social media may help improve digital health literacy and counter misleading information about statin therapy.
Jayaprakash, A.; Liberati, E.; Lindsay, R.; Willars, J.; Gibson, J.; Fritz, Z.; Price, A.; Hatfield, T.; Richards, N.; Martin, G.
Objectives People with mental health conditions experience increased rates of diagnostic errors and delays in acute treatment. While causes such as diagnostic overshadowing (misattribution of physical symptoms to mental health conditions) are well documented, less attention has been paid to the organisational and structural conditions that shape diagnostic work. This study examines how physical illness is diagnosed in patients with mental health conditions in emergency departments (EDs), with a focus on the structural conditions that enable or constrain safe diagnostic practice. Method We conducted a multi-site ethnography across three purposively selected EDs in England between April 2023 and April 2024, varying in size, population demographics, and local service configuration. Data were collected through 284 hours of non-participant observation and 20 semi-structured interviews with ED staff. Results Our analysis identified four recurring structural gaps that shaped the conditions under which physical health diagnosis took place for patients with mental health conditions: a design gap, whereby targets and physical layouts constrained diagnostic reasoning; a preparedness gap, reflecting the lack of structural support to allow staff to act on their existing knowledge and skills; a coordination gap, reflecting fragmented ownership and the challenges of joint assessment across mental and physical healthcare teams; and an expectation gap, whereby unmet need elsewhere in the system increased demand for ED services that were beyond its formal scope. These gaps made diagnostic errors and delay more likely for patients with mental health conditions seeking physical healthcare in the ED. Conclusions As new dedicated mental health EDs are introduced in England, there is an opportunity to avoid reproducing these structural gaps in new settings. 
Our study suggests that improving physical healthcare for patients with mental health conditions requires changes to how EDs are designed, resourced and supported, and how they connect with the wider health and care system. Keywords: mental health, diagnostic inequality, emergency departments
Osborne, T.; Mahmud, T.; Zheng, X.; Jampala, S.; Abbasi, S.; Hong, S.; Kranz, K.; Lee, S.; Ng, P.; Odekon, K.; Schachter, L.; Sexton, R.; Spinnato, T.; Tharakan, M.; Wu, Z.; Wang, F.; Wong, R.
Although large language models (LLMs) have shown promise for discharge summary generation, their value may be greater in longer hospitalizations, where increasing documentation volume and complexity increase both clinician burden and the risk of communication failures during transitions of care. Prior evaluations of LLM-generated discharge summaries have largely involved shorter stays and have rarely examined receiving-clinician priorities or incidental finding reporting. We compared LLM-generated and human-authored discharge summaries for 60 Internal Medicine hospitalizations lasting 7 to 21 days, with paired assessment by hospitalists and primary care physicians (PCPs). Clinician reviewers preferred LLM-generated summaries for 95% of encounters and rated them higher for quality, readability, factuality and completeness. PCPs, the primary recipients responsible for post-discharge care, found that LLM-generated summaries were better for understanding and communicating hospital care to patients, and providing follow-up care. LLM-generated summaries had fewer annotated errors, primarily due to fewer omissions, without increased estimated harm potential or likelihood compared with human-authored summaries. Benefits of LLM-generated summaries were especially salient for PCPs, who identified more omissions with greater downstream likelihood of harm than hospitalists. This underscores the importance of designing transition documents around the needs of clinicians assuming care post-discharge. LLM identification of radiology incidental findings was generally accurate and appropriate, suggesting potential to improve follow-up of clinically relevant findings. These findings extend prior work by demonstrating clinical value of LLMs in summarizing longer, complex hospitalizations and highlighting the value of stakeholder-centered design in clinical AI systems. 
Together, they support supervised LLM-assisted discharge summarization as a tool to reduce cognitive burden, improve documentation quality, and enhance transition-of-care communication.
Collier, A.
Background Electronic health record documentation patterns may reflect workflow complexity, monitoring intensity, and operational strain in intensive care settings. However, documentation-derived features can be sensitive to local documentation culture, data capture systems, and outcome definitions. Retrospective validation across multiple datasets is therefore needed before these signals are used in workflow intelligence or clinical AI governance tools. Objective To evaluate whether documentation-density and documentation-timing features show reproducible retrospective signal for ICU workflow complexity and long-stay proxy outcomes across de-identified critical care datasets, while distinguishing workflow and long-stay associations from unsupported claims about mortality prediction, burden reduction, or deployment readiness. Methods We synthesized retrospective validation results from de-identified ICU and workflow datasets generated through a prespecified documentation-density validation program. Feature families included Documentation Burden Score style features, Shift-End Documentation Rate style features, documentation reliability style metadata, and all-documentation feature sets where available. Outcomes included long ICU length of stay proxies, mortality where available, and workflow proxy endpoints. Models compared baseline feature sets with enhanced models containing documentation-density or workflow features. Performance was summarized using area under the receiver operating characteristic curve, Brier score where reported, delta AUROC, bootstrap confidence intervals where reported, and label-shuffle controls where available. Results The strongest external long-stay proxy evidence came from the NWICU chartevents analysis, which included 28,612 ICU stays, 20,267 stays with chart events, and 9,619,759 chart events. For ICU length of stay greater than the median, baseline AUROC was 0.5252. 
Enhanced AUROC was 0.9512 for Documentation Burden Score features, 0.9214 for Shift-End Documentation Rate features, 0.8470 for documentation reliability style features, and 0.9517 for all documentation features. Corresponding label-shuffle enhanced AUROCs were near random, ranging from 0.4897 to 0.5064. For ICU length of stay greater than the 75th percentile, baseline AUROC was 0.5155. Enhanced AUROC was 0.9433 for Documentation Burden Score features, 0.9194 for Shift-End Documentation Rate features, 0.8118 for documentation reliability style features, and 0.9427 for all documentation features, with label-shuffle enhanced AUROCs from 0.4836 to 0.4999. Additional retrospective support was observed in eICU workflow analyses, HiRID first-24-hour documentation-density analyses, MIMIC-IV HF ICU internal analyses, MIMIC-IV-Note metadata extensions, and nursing-chart or lab density proxy analyses. However, cross-institution discrimination transfer was weak without recalibration, and several analyses remained proxy validations rather than final clinical validations. Conclusions Documentation-density and documentation-timing features show promising retrospective signal for ICU workflow complexity and long-stay proxy outcomes, especially in NWICU chartevents and selected internal dataset-specific analyses. These findings support further preregistered, prospective, silent-mode validation of documentation-derived workflow intelligence. They do not establish prospective clinical performance, mortality reduction, clinician burden reduction, autonomous deterioration prediction, or deployment readiness.
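The label-shuffle control described in this abstract can be illustrated with a small sketch. It computes AUROC as the probability that a random positive case outscores a random negative case, then repeats the calculation with permuted labels. This is a simplification: it shuffles labels against fixed scores, whereas a full control would typically refit the model on shuffled labels. The synthetic scores are hypothetical and unrelated to the study's data.

```python
import random

def auroc(labels, scores):
    """AUROC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def label_shuffle_auroc(labels, scores, n_shuffles=100, seed=0):
    """Mean AUROC after randomly permuting the labels (expected near 0.5)."""
    rng = random.Random(seed)
    shuffled = list(labels)
    total = 0.0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        total += auroc(shuffled, scores)
    return total / n_shuffles

# Synthetic example: positives score higher on average than negatives.
rng = random.Random(42)
labels = [1] * 100 + [0] * 100
scores = [rng.gauss(1.0, 1.0) for _ in range(100)] + [rng.gauss(0.0, 1.0) for _ in range(100)]
observed = auroc(labels, scores)
control = label_shuffle_auroc(labels, scores)
print(f"observed AUROC {observed:.3f}, label-shuffle AUROC {control:.3f}")
```

A shuffled-label AUROC near 0.5 shows that the evaluation pipeline does not manufacture signal on its own. It does not rule out features that encode the outcome indirectly, for example documentation volume accumulating with time in the ICU, which is one reason the abstract's conclusions remain cautious.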
Faux-Nightingale, A.; Woodcock, C.; Walker, C.; Smith, H. E.; Welsh, V. K.
Background Chronic pain is common in adults aged 85 years and older (85+) and is associated with detrimental outcomes. Chronic pain guidelines advise first line management with non-pharmacological measures; paracetamol and non-steroidal anti-inflammatory drugs are the preferred analgesics. Challenges in accessing non-pharmacological therapies for adults aged 85+, and the presence of multimorbidity and polypharmacy, mean that opioid medication is often prescribed for chronic pain despite the potential for opioid-related adverse effects and guidance identifying long-term opioids for chronic pain as a potentially inappropriate prescription. Aim This study aims to explore patient, caregiver, and healthcare professional perspectives on the prescription of opioid medications for managing chronic pain in adults aged 85+ to support development of resources for optimising opioid prescribing. Design and Setting In this qualitative study, participants were recruited through primary care, in the community or in care home settings. Method 36 semi-structured interviews were conducted with care home residents and community dwellers aged 85+ (n=12), caregivers (informal and care home staff) (n=12), and healthcare professionals (n=12). Interviews were transcribed and analysed using reflexive thematic analysis. Results Four themes were developed: contextual complexity, satellite influences, balancing act, and pragmatic prescribing. Using opioids in adults aged 85+ is a balancing act to support patients' best possible quality of life within their unique circumstances whilst using the pain management tools available. Conclusion Opioids continue to have an important role in pain management in adults aged 85+ largely due to paucity of alternatives and the drive to support quality of life.
Ramzy, L. M.; Rahman, M.; Luque, M. O.; Rodrigues, K. K.; Belknap, R.; Venci, J. A.; Francis, B.; Ruckard, B. J.; Moran-Ibarra, W.; Rasulo, R. M.; Matadi, A.; Ramirez, M. G.; Thee, P. S.; McFeron, H. D.; Monson, S. P.; For the Tuberculosis Epidemiologic Studies Consortium
Purpose: The purpose of this study was to examine the barriers and facilitators experienced by non-U.S. born persons during the diagnosis and treatment of latent tuberculosis infection (LTBI) in primary care settings, including the impact of culturally and linguistically congruent care navigation. Design: 25 interviews with non-U.S. born patients, along with focus groups and surveys with 31 primary care team members and leadership, were conducted. Setting: The study was conducted within a network of Federally Qualified Health Center (FQHC) clinics. Participants: Participants were adult non-U.S. born patients with LTBI and FQHC care team members. A purposefully selected subsample of randomized participants was interviewed. Intervention: Care navigators followed participants randomized to receive care navigation after a positive test for tuberculosis (TB) infection and offered health navigation and education about the importance of TB screening and treatment. Method: Data collection was followed by thematic analysis guided by a critical ideological paradigm. Results: Culturally and linguistically congruent navigation emerged as central to potentially reducing barriers, fostering trust, and improving treatment continuity. Participants without navigation support reported confusion and disengagement from care, while those with culturally aligned navigators described clarity and comfort; experiences in both groups were influenced by intrinsic motivation, relational support, and culturally shaped beliefs about care. Conclusion: Care navigation that includes culturally and linguistically congruent navigators whenever possible may help increase LTBI treatment completion among non-U.S. born populations. Limitations of the study include the potential influence of cultural norms, power dynamics, and selection bias.
Xia, J.; Zhu, Z.; Zhang, G.; Shen, Q.; Su, E.; Schoones, J.; Arcelus, J.; Hu, T.; Xu, M.; Zhang, X.; Zhao, Z.; Ye, Z.; Yao, X.
Introduction: Trans and gender-diverse (TGD) individuals often face stigma and discrimination in healthcare, hindering access to gender-affirming care. Training healthcare workers on TGD health aims to foster inclusive and affirming care practices. This review aimed to evaluate the effectiveness of TGD health training programs for healthcare workers. Methods: This systematic review followed the PRISMA guidelines and was registered with PROSPERO (CRD42023443288). We searched 13 databases for studies up to March 2024, with no language/geographic restrictions. Ten reviewers screened studies in pairs, resolving discrepancies via discussion or third-reviewer input. We included randomized/non-randomized comparative and before-after studies for quantitative analysis (mean difference [MD] or standardized mean difference [SMD] with 95% CIs) and qualitative/mixed-methods studies for thematic synthesis. Evidence certainty was assessed using GRADE (quantitative) and GRADE-CERQual (qualitative). Outcomes included knowledge, attitudes, skills, discrimination, competence, comfort, TGD quality of life, and stakeholder preferences. Results: From 20,188 records, 85 studies were included. Training appears to have improved healthcare workers' knowledge (SMD=1.08, 95% CI 0.78-1.39), attitudes (SMD=0.22, 95% CI 0.05-0.39), skills (SMD=0.96, 95% CI 0.56-1.37), competence (SMD=0.55, 95% CI 0.29-0.81), and comfort (SMD=0.69, 95% CI 0.17-1.21). Qualitative analysis of 130 findings identified 18 categories and four key themes on intervention design and impact. Conclusions: TGD training programs may enhance health workers' knowledge, attitudes, skills, competence, and comfort. Well-structured, interactive, and inclusive programs showed promise, but evidence certainty was low with limited follow-up. Further high-quality research is needed to confirm these findings.
Doan, L. V.; Hung, A. M.; Olfson, M.; Williams, N. T.; Rudolph, K. E.
Introduction: Acute low back pain is a leading cause of disability worldwide. Clinical guidelines recommend non-pharmacological therapies as first-line treatment and advise caution with opioid prescribing. However, pharmacological therapies, including opioids and gabapentinoids, remain commonly used. The comparative risks of subsequent opioid use disorder (OUD) and overdose diagnosis associated with initial treatment modality in large, real-world populations are not well characterized. We estimated the incidence of new-onset OUD and overdose diagnosis among opioid-naive, Medicaid-insured adults with newly diagnosed acute low back pain and estimated the association between initial treatment modalities and subsequent OUD and overdose diagnosis risk. Methods: We conducted a retrospective cohort study using Medicaid T-MSIS Analytic files from 25 states (2016-2019). We identified opioid-naive adults with a new diagnosis of acute low back pain who initiated pharmacologic or non-pharmacologic treatment within 1 month of diagnosis. The primary outcome was incident OUD and overdose diagnosis (based on diagnosis codes in claims) during follow-up. Associations between initial treatment modality and OUD and overdose diagnosis risk were estimated using a non-parametric, doubly robust estimator to adjust for measured confounding. Results: The cohort included 525,002 opioid-naive adults initiating treatment for low back pain. The cumulative incidence of OUD and overdose diagnosis was 1.5% and 2.4% at 7 and 13 months, respectively. Compared to non-use, use of gabapentinoids during the first month of treatment was associated with the highest relative increase in risk (130.1%; 95% confidence interval [CI]: 117.8%, 142.3%); the second-highest relative increase was estimated for higher-dose opioids, defined as > 50 daily Morphine Milligram Equivalents (MME) (118.1%; 95% CI: 99.2%, 137.0%).
Lower-dose, short-duration opioids (≤ 50 MME, ≤ 7 days) were also associated with elevated risk, though substantially smaller in magnitude (20.8%, 95% CI: 13.8%, 27.9%). In contrast, non-pharmacologic, non-interventional therapies were associated with reduced OUD and overdose diagnosis risk, with physical therapy demonstrating the largest relative reduction of 34.0% (95% CI: -40.9%, -27.1%). Discussion: In opioid-naive Medicaid patients with acute low back pain, initial non-pharmacologic treatment was associated with reduced OUD and overdose diagnosis risk. Gabapentinoids and opioids were each associated with increased risk; for opioids, the degree of risk increased with higher doses and durations. These results support guideline recommendations favoring non-pharmacologic treatment as first-line therapy and indicate the importance of cautious prescribing when pharmacologic treatment is considered.
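This abstract reports effects as percent changes in risk rather than as risk ratios. The sketch below converts the quoted percentages to risk ratios, assuming each percentage is a relative change against the non-use risk, that is, (risk with treatment - risk without) / risk without. That reading is an assumption; the abstract does not define the scale explicitly.

```python
def percent_change_to_risk_ratio(percent_change):
    """Convert a relative change in risk (in percent) to a risk ratio."""
    return 1 + percent_change / 100

# Point estimates as quoted in the abstract.
reported_percent_changes = {
    "gabapentinoids": 130.1,
    "higher-dose opioids (> 50 MME)": 118.1,
    "lower-dose, short-duration opioids": 20.8,
    "physical therapy": -34.0,
}

risk_ratios = {
    name: percent_change_to_risk_ratio(pct)
    for name, pct in reported_percent_changes.items()
}
for name, rr in risk_ratios.items():
    print(f"{name}: risk ratio about {rr:.2f}")
```

Under this reading, a 130.1% increase corresponds to a risk ratio of about 2.30, and a 34.0% reduction to about 0.66.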
Belouali, A.; Kitchen, C.; Haroz, E.; Lehmann, H.; Nestadt, P. S.; Wilcox, H. C.; Kharrazi, H.
Background: Most approaches to suicide risk assessment consider clinical conditions as independent risk factors, potentially overlooking prognostic information in the order in which conditions accumulate. We applied temporal sequence mining to linked claims and mortality data to identify ordered clinical diagnostic trajectories associated with suicide death. Results: The cohort included 3 647 059 insured Maryland residents aged 10 years or older with available claims records in the Maryland Suicide Data Warehouse from January 1, 2016, to December 31, 2020, among whom 768 suicide deaths were ascertained through medical examiner linkage. Sequential pattern mining of ICD-10-CM diagnoses grouped into Clinical Classifications Software Refined categories identified 89 221 candidate sequences, of which 1 816 remained significantly associated with suicide death in time-varying Cox models. Adjusted hazard ratios (AHRs) ranged from 2.4 to 134.1. Two-thirds of significant trajectories ended in physical conditions, and approximately half crossed from psychiatric to physical endpoints. Among suicide decedents, 62% were exposed to at least 1 significant sequence (median, 16 per case); median sequence duration was 18.7 months, and median time from completion to death was 13.1 months. In landmark analyses, among patients with depression who later developed suicidal ideation (n = 26 356), the path through anxiety, then anemia, was associated with higher risk (AHR, 4.6; 95% CI, 2.2-9.5), whereas the anxiety-only path was not (AHR, 1.3; 95% CI, 0.8-2.1). Among patients with anxiety who later developed hypertension (n = 149 215), the path through history of self-harm was associated with higher risk (AHR, 32.0; 95% CI, 16.6-61.6). Associations were generally consistent across sex and age. Conclusions: Temporal ordering of clinical conditions may carry prognostic information for suicide death. 
Clinical trajectories incorporating physical illness within psychiatric sequences identified higher-risk groups. These findings suggest that opportunities for risk detection may extend beyond psychiatric settings and that suicide risk signals may be fragmented across care settings and not apparent within isolated encounters.
Shah, K. P.; Airan Javia, S.; Savage, T.; Bressman, E.
End-of-rotation handoffs are critical for patient safety but add to documentation burden for hospitalists. Generative artificial intelligence (AI) may help automate handoff creation using electronic health record data, but its impact on quality and safety is unclear. Methods: We developed an AI handoff tool with a large language model using clinical notes as input and conducted a retrospective evaluation comparing AI-generated and clinician-authored handoffs. Handoffs were assessed across domains of quality and safety through a structured review. Results: Quality ratings were similar between AI and human handoffs (3.7 vs. 3.5, p=0.57). AI-generated handoffs were rated higher for organization (4.4 vs. 4.1, p=0.05) and completeness (4.1 vs. 3.6, p=0.01), but lower for conciseness (3.7 vs. 4.1, p=0.03) and accuracy (4.1 vs. 4.4, p=0.03). Error rates were comparable (0.3/handoff in both groups); however, AI-generated handoffs included inaccuracies (9% of AI errors) and hallucinations (1% of AI errors), while clinician-authored handoffs contained only omissions. Conclusion: Human and AI handoffs have differing error profiles and tradeoffs between completeness and conciseness. Prospective evaluation in clinical workflows is underway.
Pears, M.; Wadhwa, K.; Payne, S. R.; Konstantinidis, S. T. H.; Biyani, C. S.
Large language models (LLMs) such as ChatGPT are rapidly reshaping healthcare education and simulation-based training in non-technical skills (NTS), yet no bibliometric analysis has mapped this landscape. We searched seven open-access databases (OpenAlex, PubMed, Europe PMC, Crossref, Semantic Scholar, CORE, DOAJ) for English-language publications from January 2020 to March 2026. From 100,277 initial records, a sequential keyword funnel yielded 830 candidate papers, which were screened by 83 independent Claude Sonnet 4.6 AI agents applying pre-specified inclusion criteria (PRISMA-trAIce compliant; Cohen's kappa = 0.86 pre-reconciliation, 1.0 post-reconciliation). The final AI-verified corpus comprised 551 papers with a compound annual growth rate of 109%, contributions from 2,398 authors across 279 journals in 58 countries, and an h-index of 41. ChatGPT dominated the model landscape (46% of papers), with open-source models virtually absent. Virtual patient chatbots were the leading simulation modality (106 papers). Among NTS domains, communication (145 papers) and decision-making (135 papers) were most studied, whereas teamwork, leadership, situational awareness, and crisis resource management were markedly underrepresented. Only 6 urology-relevant papers were identified, none examining LLM integration within boot camp training formats. The field is growing at extraordinary pace but remains concentrated in a narrow range of NTS domains and a single proprietary model. Critical gaps persist in team-based skills training, open-source model evaluation, and specialty-specific simulation. AI-assisted bibliometric screening using multiple independent agents is feasible, reliable, and scalable, offering a replicable methodology for mapping fast-evolving research fields.
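The agreement statistic quoted in this abstract, Cohen's kappa, compares observed agreement between two raters with the agreement expected by chance from each rater's label frequencies. A minimal sketch for two raters follows; the screening decisions in the example are hypothetical and do not come from the study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    Raises ZeroDivisionError when chance agreement is exactly 1
    (both raters use a single identical label for every item).
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical include/exclude decisions from two screeners.
screener_1 = ["include", "include", "exclude", "exclude"]
screener_2 = ["include", "exclude", "exclude", "exclude"]
kappa = cohens_kappa(screener_1, screener_2)
print(kappa)  # 0.5
```

Here the screeners agree on 3 of 4 items (0.75), chance agreement is 0.5, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. A post-reconciliation kappa of 1.0, as reported in the abstract, is expected by construction once disagreements have been resolved.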
Savic, L.; Dias, P.; Vairale, J.; Begum, S.; Khan, K.; Fowler, A. J.; Kaura, V.; Watson, S.-L.; Littlejohns, A.; Pearse, R. M.; Abbott, T. E. F.
Background: One in four surgical patients carries a drug allergy label, of which an estimated 90% are incorrect. Avoidance of first-choice drug therapies may lead to worse postoperative outcomes. We sought to determine the nature and extent of any association between drug allergy labels and postoperative complications. Methods: A multicentre observational study in 21 NHS hospitals. Eligible patients were 18 years or older, undergoing common surgical procedures: primary hip or knee replacement; internal fixation of closed long bone fracture; colorectal resection; trans-urethral resection of prostate or bladder tumour; caesarean section; hysterectomy. Exclusion criteria: use of antibiotics in the two weeks prior to surgery, previous participation in the study. Primary outcome was postoperative complications within 30 days following surgery, a composite outcome comprising: all postoperative infections, anastomotic leak, acute respiratory distress syndrome, myocardial infarction, postoperative bleed, pulmonary embolism, stroke, antimicrobial side effects, death. Results: Among 13,646 patients, 3924 (29%) carried ≥1 drug allergy label. Labelled patients were more likely to develop postoperative complications (989/3924 (25%) vs 1926/9722 (20%); OR 1.21 [1.10-1.34]; p<0.001). They were more likely to develop surgical site infections (337/3924 (9%) vs 760/9722 (8%); OR 1.19 [1.03-1.38]; p<0.018), and any postoperative infection (750/3924 (19%) vs 1472/9722 (15%); OR 1.24 [1.11-1.38]; p<0.001). Labelled patients experienced increased risk of allergic drug reactions (31/3924 (0.8%) vs 29/9722 (0.3%); OR 3.00 [1.77-5.09]; p<0.001), but no increase in mortality. Conclusions: Drug allergy labels are common, but often incorrect. Labelled patients experience worse postoperative outcomes, including infective and non-infective complications and increased risk of allergic drug reactions. Trial registration: Registered with ISRCTN registry, ISRCTN15775657.
Bergson, Z.; Vassall, S. G.; Wright, A.; McCoy, A. B.; Schafer, K. M.; Achee, M. C.; Sheffield, J. M.
Background: Concerns about "AI psychosis" have swirled in the media since ChatGPT's release, but few systematic analyses exist. We therefore conducted an electronic health record (EHR) analysis to identify the frequency, clinical characteristics, and quality of AI interactions in patients experiencing psychosis treated in a medical center. Methods: AI keywords (e.g., ChatGPT, AI) were used to search Vanderbilt University Medical Center's EHR from 12/1/2022-4/1/2026. Records were discarded if they were not AI-related or if the primary diagnosis did not include psychosis. Three raters read notes to determine if a patient was experiencing AI psychosis and classified the interactions using four a priori categories (Catalyst, Amplifier, Co-Author, Object) formulated to explain how AI-related negative outcomes emerge. Findings: 73 patients met our criteria. 28 patients were rated as experiencing AI psychosis, 17 had neutral interactions, and 28 expressed delusional content related to AI without documented evidence of conversational AI use. ChatGPT was the matching keyword for 53.6% of patients experiencing AI psychosis. The majority of AI psychosis cases were documented after ChatGPT's "4o" model was released in May 2024. Notably, the AI psychosis group had significantly more patients experiencing a first psychotic episode (60.7%) compared to the other two groups. Amplifier was the most common (64.3%) qualitative rating in the AI psychosis group. Interpretation: "AI psychosis" is an infrequent but real phenomenon observed in clinical practice. Most affected patients were experiencing their first psychotic episode and presented with AI psychosis following the release of the more sycophantic GPT-4o. Among the affected patients, AI most often exacerbated an existing condition by reinforcing distorted ideas.
Hartlage, C. S.; Manning, E. R.; Bernard, J.; Vaish, S.; Gray, J.; Young, M.; Pestian, T.; Folger, A. T.; Tachinardi, P.; Mendonca, E. A.; Brokamp, C.
Objective: To evaluate whether a locally hosted open-weight large language model (LLM) can extract documented psychosocial factors from pediatric psychiatric intake notes and apply validated extraction to a large emergency psychiatry cohort. Materials and Methods: We identified emergency department presentations at Cincinnati Children's Hospital Medical Center from January 1, 2016, through December 31, 2024, among patients younger than 18 years with psychiatric billing diagnoses. Using full-text intake notes, gpt-oss:120b classified peer conflict, sleep disruption, and school-related academic, attendance, and disciplinary issues as detected, negated, or indeterminate. Four human raters independently reviewed 50 notes. We compared Fleiss' kappa among humans alone versus humans plus the LLM, assessed repeated-query stability across 50 independent calls per note, and applied the workflow to all eligible notes. Results: Among 37,315 eligible admissions, 22,284 had eligible intake notes; 22,270 produced parseable JSON. In detected-versus-not-detected coding, human-plus-LLM reliability did not differ significantly from human-only reliability across measures (human κ 0.71-0.94; human-plus-LLM κ 0.70-0.93). Stability was associated with human agreement: mean LLM-human agreement increased from 42.6% for classifications with less than 80% stability to 82.7% for classifications with 100% stability (Pearson r = 0.36). Full-cohort extraction showed frequent and overlapping documented factors: sleep disruption was most frequently detected (57.7%), followed by peer conflict (47.2%), academic issues (43.4%), disciplinary issues (43.3%), and attendance issues (16.9%). Discussion: Agreement varied by construct and was strongest when repeated model outputs were stable. Conclusion: Locally hosted open-weight LLMs can support scalable structured extraction of documented psychosocial factors from pediatric psychiatric intake notes after local validation.
Heller, D. J.; Elkersh, Y.; Nonterah, E. A.; Kuwolamo, I.; Horowitz, C. R.; Alvarez, E. E.; Awine, T.; Govindarajulu, U.; Squires, A. P.; Aborigo, R. A.
Introduction: Hypertension is the world's leading cause of death, and depression its leading cause of disability. Control rates for these noncommunicable diseases (NCDs) are low in low- and middle-income countries (LMICs). Many LMICs have programs to screen and treat underserved communities for infectious diseases, but evidence to adapt them to treat NCDs is limited. We developed and tested an NCD program through Ghana's Community-Based Health Planning and Services (CHPS) primary care initiative. Methods: We trained 8 CHPS nurses to diagnose and treat hypertension and depression through door-to-door screening and pharmacotherapy. Physician assistants provided telehealth supervision. We combined this treatment with volunteer counseling to boost medication adherence, improve mood, and change health behaviors. We called the 90-day intervention the CHPS Opportunity for Mentally and Behaviorally Integrated NCD Engagement (COMBINE). Results: We recruited 60 adults from 580 screened: 37 with hypertension (mean blood pressure (BP) of 149/91 mm Hg) and 23 with depression (mean Patient Health Questionnaire (PHQ-9) score of 13.3). After 90 days, 57/60 (95%) completed the intervention: 32/37 (86%) achieved blood pressure control (mean BP 122/75 mm Hg), and 19 of 20 (95%) achieved depression control (mean PHQ-9 score 2.0). After 12 months, 51/60 were retained: 33/37 with hypertension (89%) and 18/23 with depression (78%), with a mean BP of 121/75 mm Hg and a mean PHQ-9 score of 1.4, respectively. All 51 (100%) achieved disease control at 12 months. Five persons left by migration and four by escalation to higher-level care. Conclusions: The COMBINE model achieved high levels of diagnosis, care retention, and disease control, with minimal adverse events, in a remote setting with limited usual NCD care. This model suggests a novel means to improve the care cascade for these and other noncommunicable diseases through existing non-physician care models in LMICs, warranting further controlled testing at scale.
Ernandez, J.; Najafi, A.; Roehrborn, C. G.; Lerner, L. B.
PURPOSE: As the armamentarium of BPH therapies continues to expand, it remains imperative to maximize patient satisfaction and minimize decisional regret. We sought to determine the impact of time from BPH diagnosis to index treatment on symptom improvement and subsequent procedural events. MATERIALS AND METHODS: We queried the American Urological Association Quality Registry for men ≥40 years old with BPH, available IPSS data, and no receipt of prior BPH treatment. Index treatment included medication, surgery, or minimally invasive surgical therapy (MIST). Outcomes included IPSS over 3 years of follow-up, change in percentage of mild lower urinary tract symptoms (LUTS) by 3 months, and time to procedural event. Patients were stratified by time from index diagnosis to treatment: <12 months, 1-3 years, and >3 years. Outcomes were compared across time-to-treatment cohorts with appropriate statistical tests, with p < 0.05 considered significant. RESULTS: 43,919 patients met criteria, with 19,642 pursuing treatment. Patients pursued treatment at a comparatively lower baseline IPSS than in prior prospective series. Patients undergoing surgery and MIST had significantly higher baseline IPSS, while medical comorbidities were significantly more common among men initiating pharmacotherapy. Early surgery and MIST were associated with significant improvement in IPSS within 6-12 months and an increase in mild LUTS by 3 months. All forms of early treatment were associated with delayed time to procedural events, including catheterization and fulguration. CONCLUSIONS: Early procedural intervention for BPH is associated with early symptom improvement and delayed time to procedural events in real-world, contemporary practice.
Coscini, N.; Giallo, R.; Grobler, A.; Hiscock, H.; Mulraney, M.; Pope, N.
Objectives: To explore caregiver and clinician perspectives on implementing mental health conversations and supports for caregivers of children with chronic conditions in paediatric outpatient clinics. Specifically, views were sought on (a) screening approaches and measures (phase 1) and (b) how feedback and support could be provided to caregivers experiencing mental health difficulties (phase 2). Methods: Caregivers and clinicians from two outpatient clinics (neuromuscular and diabetes) at a tertiary paediatric hospital in Melbourne, Australia participated in online focus groups in July and August 2024. Caregivers were recruited from outpatient clinics and clinicians were recruited via email. Both groups were combined for phase 1 before separating into breakout rooms for phase 2. Two authors conducted reflexive thematic analysis of transcripts using NVivo. Results: Sixteen participants (caregivers n = 8; clinicians n = 8) took part in two semi-structured focus groups. Analysis generated two overarching domains, each comprising multiple themes. Domain 1, Addressing caregiver mental health, captured themes of overwhelm and invisibility, diverse caregiving roles, and the need for time and resources to support wellbeing conversations. Domain 2, Housing the mental health conversation, encompassed themes of screening preferences, caregiver agency in confidentiality, delivery of feedback, and access to tailored supports. Conclusions: Caregivers and clinicians support routine caregiver mental health discussions in paediatric outpatient settings. Caregivers favour screening at diagnosis and key transitions, with clear and actionable feedback delivered away from the child. Questions about record-keeping warrant further exploration, as do the perspectives of fathers.